encode to utf-8 instead of latin-1 #15
base: master
Codecov Report: All modified and coverable lines are covered by tests ✅ See 5 files with indirect coverage changes.
Thank you very much for your PR! Unfortunately it fails to build, but there is hope, because more than half of the tests pass, so this suggests that with minor rewrites to the tests it may work fine again. I'll have a look.
I did not reply before, but I did look into this, and it involved more than minor rewrites for the remaining tests to pass. I will eventually come back to it when I have time, and maybe AI is advanced enough now to actually be useful for this kind of bug fixing.
Pull Request Overview
This PR changes the character encoding from latin-1 to utf-8 in the b() compatibility function to enable support for Cyrillic and other non-ASCII characters in file names.
Key Changes:
- Updated the `b()` function to use `utf_8_encode` instead of `latin_1_encode` for Python 3 string-to-bytes conversion
  def b(x):
      if isinstance(x, _str):
-         return codecs.latin_1_encode(x)[0]
+         return codecs.utf_8_encode(x)[0]
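To illustrate what this one-line change affects, here is a minimal sketch comparing the two codec functions on a Cyrillic string. The file name `файл.txt` is a hypothetical example, not one taken from the project or its tests.

```python
import codecs

# Hypothetical Cyrillic file name, used only for illustration.
name = "файл.txt"

# UTF-8 can encode any Unicode string; the codec function returns
# a (bytes, characters_consumed) tuple, hence the [0].
utf8_bytes = codecs.utf_8_encode(name)[0]

# Latin-1 only covers code points U+0000 to U+00FF, so Cyrillic
# characters cannot be encoded and an exception is raised.
try:
    codecs.latin_1_encode(name)
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(utf8_bytes)
print("latin-1 could encode the name:", latin1_ok)
```

Each Cyrillic letter takes two bytes in UTF-8, so the encoded name is longer than the string itself.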
Copilot AI, Nov 13, 2025
The encoding has been changed from latin-1 to utf-8 in the b() function, but the _bytes() function on line 66 still uses 'latin-1' encoding. For consistency and to properly support UTF-8 characters (like Cyrillic), this should also be updated to 'utf-8'.
Note that this change also requires updating corresponding decode('latin-1') calls elsewhere in the codebase (e.g., in header_ecc.py line 603 and structural_adaptive_ecc.py line 657) to use decode('utf-8') instead, to ensure proper round-trip encoding/decoding of file paths with non-ASCII characters.
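The round-trip concern can be sketched in a few lines. This uses only the built-in `str.encode` and `bytes.decode`; the file name and the raw byte values are hypothetical. It also shows one general difference between the codecs: latin-1 decodes every possible byte value, while UTF-8 rejects invalid sequences. Whether the codebase routes arbitrary binary data through these helpers is an assumption here, not something the thread confirms, but if it does, switching the decode side to UTF-8 could raise errors where latin-1 never did.

```python
# Hypothetical Cyrillic file name, used only for illustration.
name = "файл.txt"

encoded = name.encode("utf-8")

# Mismatched pair: decoding UTF-8 bytes as latin-1 never raises,
# but it yields one character per byte (mojibake), not the original.
mismatched = encoded.decode("latin-1")

# Matched pair: the original string is recovered.
matched = encoded.decode("utf-8")

# latin-1 maps all 256 byte values to characters, so any byte
# string decodes; UTF-8 rejects bytes such as 0xFF.
raw = bytes([0xFF, 0xFE])  # arbitrary non-text bytes (assumed example)
latin1_text = raw.decode("latin-1")
try:
    raw.decode("utf-8")
    utf8_rejects = False
except UnicodeDecodeError:
    utf8_rejects = True

print(matched == name, mismatched == name, utf8_rejects)
```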
@copilot open a new pull request to apply changes based on this feedback
to add support for e.g. Cyrillic characters in file names; untested, let's see if CI/CD passes :)
related to #13